Skip to content

Push UserToolSource semantic validation onto the pydantic model#22615

Open
mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
mvdbeek:more-validation-user-tool-sources
Open

Push UserToolSource semantic validation onto the pydantic model#22615
mvdbeek wants to merge 2 commits intogalaxyproject:devfrom
mvdbeek:more-validation-user-tool-sources

Conversation

@mvdbeek
Copy link
Copy Markdown
Member

@mvdbeek mvdbeek commented May 1, 2026

Takes the validation in #22611 and moves it into the pydantic models so all consumers benefit from these rules.

Move the id pattern, container shape, citation DOI/BibTeX checks, blank required-field checks, undeclared inputs.<name> references in shell_command / configfiles[*].content, and dataset/collection output discovery requirements onto field/model validators on _DynamicToolSourceBase, UserToolSource, and Citation. Doing so benefits API callers of DynamicUnprivilegedToolCreatePayload, not just CustomToolAgent.

Tool ids accept [a-z][a-z0-9_-]* (hyphens allowed). Outputs of a tool with shell_command must claim their bytes via from_work_dir or discover_datasets; the prior "name appears in command" heuristic produced both false positives and false negatives.

format_validation_errors distills ValidationError to friendly bullet strings for reuse by the agent and API layers. CustomToolAgent walks the cause chain on UnexpectedModelBehavior to surface the underlying ValidationError as a low-confidence response with bullet list, and logs at debug since validation failure is the expected path while a user is editing.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@github-project-automation github-project-automation Bot moved this to Needs Review in Galaxy Dev - weeklies May 1, 2026
@mvdbeek mvdbeek requested review from dannon and jmchilton and removed request for dannon May 1, 2026 15:52
@github-actions github-actions Bot added this to the 26.1 milestone May 1, 2026
Move the id pattern, container shape, citation DOI/BibTeX checks, blank
required-field checks, undeclared `inputs.<name>` references in
`shell_command` / `configfiles[*].content`, and dataset/collection output
discovery requirements onto field/model validators on
`_DynamicToolSourceBase`, `UserToolSource`, and `Citation`. Doing so
benefits API callers of `DynamicUnprivilegedToolCreatePayload`, not just
`CustomToolAgent`.

Tool ids accept `[a-z][a-z0-9_-]*` (hyphens allowed). Outputs of a tool
with `shell_command` must claim their bytes via `from_work_dir` or
`discover_datasets`; the prior "name appears in command" heuristic
produced both false positives and false negatives.

`format_validation_errors` distills `ValidationError` to friendly bullet
strings for reuse by the agent and API layers. `CustomToolAgent` walks
the cause chain on `UnexpectedModelBehavior` to surface the underlying
`ValidationError` as a low-confidence response with bullet list, and
logs at debug since validation failure is the expected path while a
user is editing.
@mvdbeek mvdbeek force-pushed the more-validation-user-tool-sources branch from 0b5ed56 to 0258589 Compare May 1, 2026 16:07
@mvdbeek mvdbeek requested a review from dannon May 1, 2026 16:57
Comment thread lib/galaxy/tool_util_models/__init__.py Outdated
# references allow optional registry path segments and an optional tag.
CONTAINER_PREFIXES: Tuple[str, ...] = ("quay.io/biocontainers/", "docker://", "oras://")
DOCKER_IMAGE_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*(/[a-zA-Z0-9._-]+)*(:[\w][\w.-]*)?$")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a linter concern not a tool source validation concern. Really -.9 on this being here. This isn't a YAML tool concern it is an abstract ToolSource concern. Create a lint rule and require it to be met for Galaxy User Defined tools behind some flag that defaults to that behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3c97502 any better ?

dannon added a commit to dannon/galaxy that referenced this pull request May 1, 2026
Builds on galaxyproject#22615's integrated UserToolSource validation. The producer
agent runs with output_retries=0 so the reflection loop owns the retry
and can pass a structured error list back to the prompt rather than
pydantic-ai's generic 'try again' message.

Two opt-in loops on top of pydantic validation:

- Validator-driven retry (default on): if the producer's output fails
  validation, _produce_tool returns the formatted error list, and
  process() re-calls the producer once with those errors prefixed to
  the prompt. Cap of one retry; if it still fails, the agent returns
  a low-confidence validation_failed response.
- Quality critic + refine (default off): an LLM critic agent reviews
  the validated tool for clarity (description, labels, help text) and
  idiomaticity (defaults, exposed options, container choice) -- the
  fuzzy dimensions pydantic can't see. If the critic flags significant
  issues (should_refine=true), the producer is re-rolled once with
  the critique. Cap of one refine; if refinement breaks validation,
  the original tool is kept rather than ship something worse.

Both gates live under inference_services.custom_tool:
validator_retry_enabled (default true) and quality_critic_enabled
(default false). Defaulting the critic off keeps the cost neutral for
deployments that don't opt in -- the critic doubles tool-creation
latency / spend when enabled.

The 170-line process() method got factored into named helpers
(capability check, structured-output extraction, validation-failed
response, success response, model-error handlers) to keep the
reflection control flow readable.
dannon added a commit to dannon/galaxy that referenced this pull request May 1, 2026
Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a
shell_command must claim its outputs via from_work_dir or
discover_datasets, but the prompt only mentioned the former. The
producer was occasionally generating tools that pattern-matched on
output names in the command, which the integrated validator now
correctly rejects.

Pre-commit's markdown formatter also normalized the YAML example
indentation in the same file.
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request May 1, 2026
Reviewer feedback on galaxyproject#22615: the container regex/prefix shape check is a
linter concern, not a tool source validation concern, and applies to any
ToolSource (not just YAML). Pull the shape check off
`UserToolSource._check_container_shape` and surface it as a new
`ContainerImageShape` linter under `galaxy.tool_util.linters.containers`,
discovered automatically by the existing lint framework
(`lint_tool_types = ["*"]`). The linter walks
`tool_source.parse_requirements()` and warns on identifiers that do not
match a recognized prefix or `<image>[:<tag>]` shape.

For Galaxy User Defined tools, enforcement is unconditional: the
unprivileged tools API and the `CustomToolAgent` both call a new
`lint_user_tool_source` helper after pydantic validation succeeds and
block creation if WARN+ messages are returned.

The not-empty/whitespace check on `container` stays on the pydantic
model — that is a basic field validity check rather than a "shape"
concern. Container-shape tests move from
`test_user_tool_source_validation.py` to a new
`test_container_shape_lint.py` covering both the linter and the helper.
Reviewer feedback on galaxyproject#22615: the container regex/prefix shape check is a
linter concern, not a tool source validation concern, and applies to any
ToolSource (not just YAML). Pull the shape check off
`UserToolSource._check_container_shape` and surface it as a new
`ContainerImageShape` linter under `galaxy.tool_util.linters.containers`,
discovered automatically by the existing lint framework
(`lint_tool_types = ["*"]`). The linter walks
`tool_source.parse_requirements()` and warns on identifiers that do not
match a recognized prefix or `<image>[:<tag>]` shape.

For Galaxy User Defined tools, enforcement is unconditional: the
unprivileged tools API and the `CustomToolAgent` both call a new
`lint_user_tool_source` helper after pydantic validation succeeds and
block creation if WARN+ messages are returned.

The not-empty/whitespace check on `container` stays on the pydantic
model — that is a basic field validity check rather than a "shape"
concern. Container-shape tests move from
`test_user_tool_source_validation.py` to a new
`test_container_shape_lint.py` covering both the linter and the helper.
@mvdbeek mvdbeek force-pushed the more-validation-user-tool-sources branch from 25ca402 to 3c97502 Compare May 1, 2026 19:21
dannon added a commit to dannon/galaxy that referenced this pull request May 1, 2026
Builds on galaxyproject#22615's integrated UserToolSource validation. The producer
agent runs with output_retries=0 so the reflection loop owns the retry
and can pass a structured error list back to the prompt rather than
pydantic-ai's generic 'try again' message.

Two opt-in loops on top of pydantic validation:

- Validator-driven retry (default on): if the producer's output fails
  validation, _produce_tool returns the formatted error list, and
  process() re-calls the producer once with those errors prefixed to
  the prompt. Cap of one retry; if it still fails, the agent returns
  a low-confidence validation_failed response.
- Quality critic + refine (default off): an LLM critic agent reviews
  the validated tool for clarity (description, labels, help text) and
  idiomaticity (defaults, exposed options, container choice) -- the
  fuzzy dimensions pydantic can't see. If the critic flags significant
  issues (should_refine=true), the producer is re-rolled once with
  the critique. Cap of one refine; if refinement breaks validation,
  the original tool is kept rather than ship something worse.

Both gates live under inference_services.custom_tool:
validator_retry_enabled (default true) and quality_critic_enabled
(default false). Defaulting the critic off keeps the cost neutral for
deployments that don't opt in -- the critic doubles tool-creation
latency / spend when enabled.

The 170-line process() method got factored into named helpers
(capability check, structured-output extraction, validation-failed
response, success response, model-error handlers) to keep the
reflection control flow readable.
dannon added a commit to dannon/galaxy that referenced this pull request May 1, 2026
Mirrors galaxyproject#22615's relaxed output discovery -- a UserToolSource with a
shell_command must claim its outputs via from_work_dir or
discover_datasets, but the prompt only mentioned the former. The
producer was occasionally generating tools that pattern-matched on
output names in the command, which the integrated validator now
correctly rejects.

Pre-commit's markdown formatter also normalized the YAML example
indentation in the same file.
mvdbeek added a commit that referenced this pull request May 3, 2026
Reviewer feedback on #22615: the container regex/prefix shape check is a
linter concern, not a tool source validation concern, and applies to any
ToolSource (not just YAML). Pull the shape check off
`UserToolSource._check_container_shape` and surface it as a new
`ContainerImageShape` linter under `galaxy.tool_util.linters.containers`,
discovered automatically by the existing lint framework
(`lint_tool_types = ["*"]`). The linter walks
`tool_source.parse_requirements()` and warns on identifiers that do not
match a recognized prefix or `<image>[:<tag>]` shape.

For Galaxy User Defined tools, enforcement is unconditional: the
unprivileged tools API and the `CustomToolAgent` both call a new
`lint_user_tool_source` helper after pydantic validation succeeds and
block creation if WARN+ messages are returned.

The not-empty/whitespace check on `container` stays on the pydantic
model — that is a basic field validity check rather than a "shape"
concern. Container-shape tests move from
`test_user_tool_source_validation.py` to a new
`test_container_shape_lint.py` covering both the linter and the helper.
mvdbeek pushed a commit that referenced this pull request May 3, 2026
Builds on #22615's integrated UserToolSource validation. The producer
agent runs with output_retries=0 so the reflection loop owns the retry
and can pass a structured error list back to the prompt rather than
pydantic-ai's generic 'try again' message.

Two opt-in loops on top of pydantic validation:

- Validator-driven retry (default on): if the producer's output fails
  validation, _produce_tool returns the formatted error list, and
  process() re-calls the producer once with those errors prefixed to
  the prompt. Cap of one retry; if it still fails, the agent returns
  a low-confidence validation_failed response.
- Quality critic + refine (default off): an LLM critic agent reviews
  the validated tool for clarity (description, labels, help text) and
  idiomaticity (defaults, exposed options, container choice) -- the
  fuzzy dimensions pydantic can't see. If the critic flags significant
  issues (should_refine=true), the producer is re-rolled once with
  the critique. Cap of one refine; if refinement breaks validation,
  the original tool is kept rather than ship something worse.

Both gates live under inference_services.custom_tool:
validator_retry_enabled (default true) and quality_critic_enabled
(default false). Defaulting the critic off keeps the cost neutral for
deployments that don't opt in -- the critic doubles tool-creation
latency / spend when enabled.

The 170-line process() method got factored into named helpers
(capability check, structured-output extraction, validation-failed
response, success response, model-error handlers) to keep the
reflection control flow readable.
mvdbeek pushed a commit that referenced this pull request May 3, 2026
Mirrors #22615's relaxed output discovery -- a UserToolSource with a
shell_command must claim its outputs via from_work_dir or
discover_datasets, but the prompt only mentioned the former. The
producer was occasionally generating tools that pattern-matched on
output names in the command, which the integrated validator now
correctly rejects.

Pre-commit's markdown formatter also normalized the YAML example
indentation in the same file.
mvdbeek added a commit that referenced this pull request May 4, 2026
Reviewer feedback on #22615: the container regex/prefix shape check is a
linter concern, not a tool source validation concern, and applies to any
ToolSource (not just YAML). Pull the shape check off
`UserToolSource._check_container_shape` and surface it as a new
`ContainerImageShape` linter under `galaxy.tool_util.linters.containers`,
discovered automatically by the existing lint framework
(`lint_tool_types = ["*"]`). The linter walks
`tool_source.parse_requirements()` and warns on identifiers that do not
match a recognized prefix or `<image>[:<tag>]` shape.

For Galaxy User Defined tools, enforcement is unconditional: the
unprivileged tools API and the `CustomToolAgent` both call a new
`lint_user_tool_source` helper after pydantic validation succeeds and
block creation if WARN+ messages are returned.

The not-empty/whitespace check on `container` stays on the pydantic
model — that is a basic field validity check rather than a "shape"
concern. Container-shape tests move from
`test_user_tool_source_validation.py` to a new
`test_container_shape_lint.py` covering both the linter and the helper.
mvdbeek pushed a commit that referenced this pull request May 4, 2026
Builds on #22615's integrated UserToolSource validation. The producer
agent runs with output_retries=0 so the reflection loop owns the retry
and can pass a structured error list back to the prompt rather than
pydantic-ai's generic 'try again' message.

Two opt-in loops on top of pydantic validation:

- Validator-driven retry (default on): if the producer's output fails
  validation, _produce_tool returns the formatted error list, and
  process() re-calls the producer once with those errors prefixed to
  the prompt. Cap of one retry; if it still fails, the agent returns
  a low-confidence validation_failed response.
- Quality critic + refine (default off): an LLM critic agent reviews
  the validated tool for clarity (description, labels, help text) and
  idiomaticity (defaults, exposed options, container choice) -- the
  fuzzy dimensions pydantic can't see. If the critic flags significant
  issues (should_refine=true), the producer is re-rolled once with
  the critique. Cap of one refine; if refinement breaks validation,
  the original tool is kept rather than ship something worse.

Both gates live under inference_services.custom_tool:
validator_retry_enabled (default true) and quality_critic_enabled
(default false). Defaulting the critic off keeps the cost neutral for
deployments that don't opt in -- the critic doubles tool-creation
latency / spend when enabled.

The 170-line process() method got factored into named helpers
(capability check, structured-output extraction, validation-failed
response, success response, model-error handlers) to keep the
reflection control flow readable.
mvdbeek pushed a commit that referenced this pull request May 4, 2026
Mirrors #22615's relaxed output discovery -- a UserToolSource with a
shell_command must claim its outputs via from_work_dir or
discover_datasets, but the prompt only mentioned the former. The
producer was occasionally generating tools that pattern-matched on
output names in the command, which the integrated validator now
correctly rejects.

Pre-commit's markdown formatter also normalized the YAML example
indentation in the same file.
mvdbeek added a commit that referenced this pull request May 6, 2026
Reviewer feedback on #22615: the container regex/prefix shape check is a
linter concern, not a tool source validation concern, and applies to any
ToolSource (not just YAML). Pull the shape check off
`UserToolSource._check_container_shape` and surface it as a new
`ContainerImageShape` linter under `galaxy.tool_util.linters.containers`,
discovered automatically by the existing lint framework
(`lint_tool_types = ["*"]`). The linter walks
`tool_source.parse_requirements()` and warns on identifiers that do not
match a recognized prefix or `<image>[:<tag>]` shape.

For Galaxy User Defined tools, enforcement is unconditional: the
unprivileged tools API and the `CustomToolAgent` both call a new
`lint_user_tool_source` helper after pydantic validation succeeds and
block creation if WARN+ messages are returned.

The not-empty/whitespace check on `container` stays on the pydantic
model — that is a basic field validity check rather than a "shape"
concern. Container-shape tests move from
`test_user_tool_source_validation.py` to a new
`test_container_shape_lint.py` covering both the linter and the helper.
mvdbeek pushed a commit that referenced this pull request May 6, 2026
Builds on #22615's integrated UserToolSource validation. The producer
agent runs with output_retries=0 so the reflection loop owns the retry
and can pass a structured error list back to the prompt rather than
pydantic-ai's generic 'try again' message.

Two opt-in loops on top of pydantic validation:

- Validator-driven retry (default on): if the producer's output fails
  validation, _produce_tool returns the formatted error list, and
  process() re-calls the producer once with those errors prefixed to
  the prompt. Cap of one retry; if it still fails, the agent returns
  a low-confidence validation_failed response.
- Quality critic + refine (default off): an LLM critic agent reviews
  the validated tool for clarity (description, labels, help text) and
  idiomaticity (defaults, exposed options, container choice) -- the
  fuzzy dimensions pydantic can't see. If the critic flags significant
  issues (should_refine=true), the producer is re-rolled once with
  the critique. Cap of one refine; if refinement breaks validation,
  the original tool is kept rather than ship something worse.

Both gates live under inference_services.custom_tool:
validator_retry_enabled (default true) and quality_critic_enabled
(default false). Defaulting the critic off keeps the cost neutral for
deployments that don't opt in -- the critic doubles tool-creation
latency / spend when enabled.

The 170-line process() method got factored into named helpers
(capability check, structured-output extraction, validation-failed
response, success response, model-error handlers) to keep the
reflection control flow readable.
mvdbeek pushed a commit that referenced this pull request May 6, 2026
Mirrors #22615's relaxed output discovery -- a UserToolSource with a
shell_command must claim its outputs via from_work_dir or
discover_datasets, but the prompt only mentioned the former. The
producer was occasionally generating tools that pattern-matched on
output names in the command, which the integrated validator now
correctly rejects.

Pre-commit's markdown formatter also normalized the YAML example
indentation in the same file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Needs Review

Development

Successfully merging this pull request may close these issues.

2 participants